Ontario County
- Europe > France (0.15)
- Europe > United Kingdom (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- (28 more...)
- Transportation > Passenger (1.00)
- Transportation > Marine (1.00)
- Leisure & Entertainment > Sports > Football (1.00)
- (4 more...)
Tree Search for LLM Agent Reinforcement Learning
Ji, Yuxiang, Ma, Ziyu, Wang, Yong, Chen, Guanhua, Chu, Xiangxiang, Wu, Liaoni
Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-term and multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from the problem of sparse supervision. To address the challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree search, where each tree node represents the complete agent interaction step. By sharing common prefixes, the tree search sampling increases the number of rollouts achievable within a fixed budget of tokens or tool calls. Moreover, we find that the tree-structured trajectory naturally allows the construction of step-wise process supervised signals even using only the outcome reward. Based on this, Tree-GRPO estimates the grouped relative advantages both on intra-tree and inter-tree levels. Through theoretical analysis, we demonstrate that the objective of intra-tree level group relative policy optimization is equivalent to that of step-level direct preference learning. Experiments across 11 datasets and 3 types of QA tasks demonstrate the superiority of the proposed tree-based RL over the chain-based RL method.
Figure 1: Comparison of chain-based and tree-based sampling strategies in LLM multi-turn agent RL. The tree structure brings two major advantages: (i) less rollout budget (both on tokens and tool-calls); (ii) higher performance. Right (Ours): Tree search with nodes corresponding to complete agent step.
Reinforcement Learning (RL) has emerged as a pivotal post-training paradigm for Large Language Models (LLMs), catalyzing the development of several frontier models (DeepSeek-AI Team, 2025; Yang et al., 2025a; OpenAI, 2024). RL-tuned LLMs trained only with outcome rewards acquire complex reasoning abilities and achieve remarkable gains in single-turn tasks, such as mathematical proof and code generation (Team et al., 2025b; Yu et al., 2025; Chu et al., 2025a; Shao et al., 2024; Xin et al., 2024). This suggests that LLMs can learn not only through static imitation, but also by actively interacting with dynamic environments. Guided by this prospect, recent works have extended this RL paradigm to more complex agent settings involving dynamic, multi-turn interactions (Feng et al., 2025b; Singh et al., 2025; Wang et al., 2025b; Qian et al., 2025; Feng et al., ...)
Work done during internship at AMAP, Alibaba Group.
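The idea of estimating group-relative advantages at both the intra-tree and inter-tree levels can be illustrated with a minimal sketch. It assumes the standard GRPO-style normalization (reward minus group mean, divided by group standard deviation), treats all leaves of one tree as a single intra-tree group, and simply sums the two levels. These choices, along with the names `group_relative` and `tree_advantages`, are illustrative assumptions and not the authors' implementation.

```python
from statistics import mean, pstdev

def group_relative(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def tree_advantages(trees):
    """Combine intra-tree and inter-tree group-relative advantages.

    `trees` is a list of lists: one inner list of outcome rewards per tree.
    Assumption: the two levels are combined by simple addition.
    """
    # Intra-tree: compare each rollout only with rollouts from the same tree.
    intra = [group_relative(tree) for tree in trees]

    # Inter-tree: compare each rollout with all rollouts across all trees.
    flat = [r for tree in trees for r in tree]
    inter_flat = group_relative(flat)

    combined, start = [], 0
    for tree, a_intra in zip(trees, intra):
        a_inter = inter_flat[start:start + len(tree)]
        start += len(tree)
        combined.append([x + y for x, y in zip(a_intra, a_inter)])
    return combined

# Two trees with two rollouts each; rewards are binary outcome rewards.
advantages = tree_advantages([[1.0, 0.0], [0.0, 0.0]])
```

In this toy input the second tree has identical rewards, so its intra-tree term is zero and only the inter-tree comparison contributes a signal.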
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Southern Ocean (0.04)
- North America > Canada > Ontario > Lambton County > Sarnia (0.04)
- (11 more...)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Media > Music (0.68)
- Leisure & Entertainment > Sports > Football (0.68)
Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI
Kapar, Jan, Günther, Kathrin, Vallis, Lori Ann, Berger, Klaus, Binder, Nadine, Brenner, Hermann, Castell, Stefanie, Fischer, Beate, Harth, Volker, Holleczek, Bernd, Intemann, Timm, Ittermann, Till, Karch, André, Keil, Thomas, Krist, Lilian, Lange, Berit, Leitzmann, Michael F., Nimptsch, Katharina, Obi, Nadia, Pigeot, Iris, Pischon, Tobias, Schikowski, Tamara, Schmidt, Börge, Schmidt, Carsten Oliver, Sedlmair, Anja M., Tanoey, Justine, Wienbergen, Harm, Wienke, Andreas, Wigmann, Claudia, Wright, Marvin N.
Generative artificial intelligence for synthetic data generation holds substantial potential to address practical challenges in epidemiology. However, many current methods suffer from limited quality, high computational demands, and complexity for non-experts. Furthermore, common evaluation strategies for synthetic data often fail to directly reflect statistical utility. Against this background, a critical underexplored question is whether synthetic data can reliably reproduce key findings from epidemiological research. We propose the use of adversarial random forests (ARF) as an efficient and convenient method for synthesizing tabular epidemiological data. To evaluate its performance, we replicated statistical analyses from six epidemiological publications and compared original with synthetic results. These publications cover blood pressure, anthropometry, myocardial infarction, accelerometry, loneliness, and diabetes, based on data from the German National Cohort (NAKO Gesundheitsstudie), the Bremen STEMI Registry U45 Study, and the Guelph Family Health Study. Additionally, we assessed the impact of dimensionality and variable complexity on synthesis quality by limiting datasets to variables relevant for individual analyses, including necessary derivations. Across all replicated original studies, results from multiple synthetic data replications consistently aligned with original findings. Even for datasets with relatively low sample size-to-dimensionality ratios, the replication outcomes closely matched the original results across various descriptive and inferential analyses. Reducing dimensionality and pre-deriving variables further enhanced both quality and stability of the results.
- Europe > Germany > Bremen > Bremen (0.14)
- Europe > Germany > Bavaria > Regensburg (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Strength Medium (0.67)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Epidemiology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.92)
- (2 more...)
Automatic Construction of a Large-Scale Corpus for Geoparsing Using Wikipedia Hyperlinks
Ohno, Keyaki, Kameko, Hirotaka, Shirai, Keisuke, Nishimura, Taichi, Mori, Shinsuke
Geoparsing is the task of estimating the latitude and longitude (coordinates) of location expressions in texts. Geoparsing must deal with ambiguous expressions, where the same notation can indicate multiple locations. For evaluating geoparsing systems, several corpora have been proposed in previous work. However, these corpora are small-scale and offer limited coverage of location expressions in general domains. In this paper, we propose Wikipedia Hyperlink-based Location Linking (WHLL), a novel method to construct a large-scale corpus for geoparsing from Wikipedia articles. WHLL leverages hyperlinks in Wikipedia to annotate multiple location expressions with coordinates. With this method, we constructed the WHLL corpus, a new large-scale corpus for geoparsing. The WHLL corpus consists of 1.3M articles, each containing about 7.8 unique location expressions. Of these location expressions, 45.6% are ambiguous and refer to more than one location with the same notation. In each article, location expressions in the article title and in hyperlinks to other articles are assigned coordinates. By utilizing hyperlinks, we can accurately assign coordinates to location expressions, even ambiguous ones. Experimental results show that there remains room for improvement by disambiguating location expressions.
- North America > Canada > Ontario (0.07)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.06)
- North America > United States > New York > Ontario County (0.05)
- (5 more...)
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Wu, Zeqiu, Hu, Yushi, Shi, Weijia, Dziri, Nouha, Suhr, Alane, Ammanabrolu, Prithviraj, Smith, Noah A., Ostendorf, Mari, Hajishirzi, Hannaneh
Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a learning signal - has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text outputs; it does not indicate which aspects of the outputs influenced user preference; e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). We conduct experiments on detoxification and long-form question answering to illustrate how learning with such reward functions leads to improved performance, supported by both automatic and human evaluation. Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models. We release all data, collected human feedback, and codes at https://FineGrainedRLHF.github.io.
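The two kinds of granularity described above (a reward after every segment, and several reward models for different feedback types) can be sketched as follows. The reward functions here are toy keyword checks standing in for learned reward models, and the names `fine_grained_rewards`, `reward_fns`, and `weights`, as well as the weighted-sum combination, are hypothetical choices for illustration only.

```python
from typing import Callable, Dict, List

RewardFn = Callable[[str], float]

def fine_grained_rewards(
    segments: List[str],
    reward_fns: Dict[str, RewardFn],
    weights: Dict[str, float],
) -> List[float]:
    """Return one combined reward per segment.

    Each segment (e.g., a sentence) is scored by every reward function,
    and the scores are combined with a weighted sum (an assumption).
    """
    return [
        sum(weights[name] * fn(segment) for name, fn in reward_fns.items())
        for segment in segments
    ]

# Toy stand-ins for learned reward models (hypothetical).
reward_fns = {
    "relevance": lambda s: -1.0 if "off-topic" in s else 0.0,
    "factuality": lambda s: -1.0 if "false" in s else 0.5,
}
weights = {"relevance": 0.3, "factuality": 0.5}

segments = ["The sky is blue.", "This is false.", "An off-topic aside."]
rewards = fine_grained_rewards(segments, reward_fns, weights)
```

Changing the entries of `weights` shifts which feedback type dominates the training signal, which mirrors the abstract's point that behavior can be customized by combining reward models differently.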
Forecasting COVID-19 Case Counts Based on 2020 Ontario Data
Silver, Daniel L., Digamarthi, Rinda
Objective: To develop machine learning models that can predict the number of COVID-19 cases per day given the last 14 days of environmental and mobility data. Approach: COVID-19 data from four counties around Toronto, Ontario, were used. Data were prepared into daily records containing the number of new COVID cases, patient demographic data, outdoor weather variables, indoor environment factors, and human movement based on cell mobility and public health restrictions. These data were analyzed to determine the most important variables and their interactions. Predictive models were developed using CNN and LSTM deep neural network approaches. A 5-fold chronological cross-validation approach was used to develop the models on data from Mar 1 to Oct 14, 2020, and to test them on data covering Oct 15 to Dec 24, 2020. Results: The best LSTM models forecasted tomorrow's daily COVID case counts with 90.7% accuracy, and the 7-day rolling average COVID case counts with 98.1% accuracy using independent test data. The best models to forecast the next 7 days of daily COVID case counts did so with 79.4% accuracy over all days. Models forecasting the 7-day rolling average case counts had a mean accuracy of 83.6% on the same test set. Conclusions: Our findings point to the importance of indoor humidity for the transmission of a virus such as COVID-19. During the coldest portions of the year, when humans spend greater amounts of time indoors or in vehicles, air quality drops within buildings, most significantly indoor relative humidity (IRH) levels. Moderate to high indoor temperatures coupled with low IRH (below 20%) create conditions where viral transmission is more likely: water vapour ejected from an infected person's mouth can remain in the air longer because of evaporation, and dry skin conditions, particularly in a recipient's airway, promote transmission.
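The data preparation described above (14-day input windows, 7-day rolling averages, and chronological cross-validation) can be sketched in plain Python. This sketch uses a single univariate series rather than the multivariate daily records in the study, and it assumes an expanding-window fold scheme, which the abstract does not specify. The function names `make_windows`, `rolling_mean`, and `chronological_folds` are hypothetical.

```python
def make_windows(series, lookback=14):
    """Pair each day's value with the preceding `lookback` days as input."""
    return [
        (series[i - lookback:i], series[i])
        for i in range(lookback, len(series))
    ]

def rolling_mean(series, window=7):
    """Trailing rolling average over `window` days."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def chronological_folds(n, k=5):
    """Expanding-window folds: train on the past, validate on the next block.

    Assumption: the n records are cut into k + 1 blocks; fold j trains on
    blocks 0..j-1 and validates on block j, so validation data always
    comes after the training data in time.
    """
    block = n // (k + 1)
    folds = []
    for j in range(1, k + 1):
        train_end = j * block
        val_end = n if j == k else (j + 1) * block
        folds.append((range(0, train_end), range(train_end, val_end)))
    return folds

daily_cases = list(range(30))          # placeholder series, not real data
windows = make_windows(daily_cases)    # (14-day input, next-day target) pairs
folds = chronological_folds(len(windows))
```

Keeping every validation block strictly later than its training block avoids leaking future case counts into the model, which is the reason for a chronological scheme rather than shuffled folds.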
- North America > Canada > Ontario > Toronto (0.25)
- North America > United States > California > San Bernardino County > Ontario (0.04)
- North America > Canada > Nova Scotia > Halifax Regional Municipality > Dartmouth (0.04)
- (7 more...)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.66)
Restaurants are using AI to guess what you want to eat
One day soon, a menu may judge you. You'll walk up to a kiosk in a quick service restaurant and a tiny camera will scan your features, registering your height, age, gender, and mood. Instantly, it will adjust its display, selecting meal options picked just for you. Once you've ordered and moved on, the person behind you will step into the menu's gaze, and the process will start again. This is the idea behind new software from Raydiant, a San Francisco-based software company that plans to roll out its AI-driven kiosks by the end of this year.
- North America > United States > California > San Francisco County > San Francisco (0.25)
- North America > United States > New York > Ontario County (0.05)
- North America > Canada > Ontario (0.05)
- Information Technology (0.92)
- Consumer Products & Services > Restaurants (0.72)
A Model of the Mechanisms Underlying Exploratory Behaviour
Gabora, Liane, Colgan, Patrick
A model of the mechanisms underlying exploratory behaviour, based on empirical research and refined using a computer simulation, is presented. The behaviour of killifish from two lakes, one with killifish predators and one without, was compared in the laboratory. Plotting average activity in a novel environment versus time resulted in an inverted-U-shaped curve for both groups; however, the curve for killifish from the lake without predators was (1) steeper, (2) reached a peak value earlier, (3) reached a higher peak value, and (4) subsumed less area than the curve for killifish from the lake with predators. We hypothesize that the shape of the exploration curve reflects a competition between motivational subsystems that excite and inhibit exploratory behaviour in a way that is tuned to match the affordance probabilities of the animal's environment. A computer implementation of this model produced curves which differed along the same four dimensions as differentiate the two killifish curves. All four differences were reproduced in the model by tuning a single parameter: the time-dependent component of the decay-rate of the exploration-inhibiting subsystem.
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- North America > United States > New York > Ontario County (0.05)
- North America > Canada > Ontario > Kingston (0.04)
- (5 more...)